Course syllabus
Data Analysis: Statistical Learning and Visualization with Project
Dataanalys: statistisk inlärning och visualisering med projekt
FMSF90, 7.5 credits, G2 (First Cycle)
Valid for: 2024/25
Faculty: Faculty of Engineering LTH
Decided by: PLED I
Date of Decision: 2024-04-16
Effective: 2024-05-08
General Information
Main field: Technology
Depth of study relative to the degree requirements: First cycle, in-depth level of the course cannot be classified
Elective for: C4-adv, F4, Pi4, R4
Language of instruction: The course will be given in English
Aim
The course begins with an overview of basic data wrangling and visualisation, with a focus on the student's ability to identify and illustrate important features of the data.
Important methods in statistical learning are then introduced, with emphasis on supervised and unsupervised learning. Issues arising from fitting and evaluating multiple models, as well as the methods' relationship to linear regression, are discussed. Computer-based labs and projects form an important part of the learning activities. The course concludes with a project in which the students select suitable methods to analyse a given data set.
Learning outcomes
Knowledge and understanding
For a passing grade the student must
- describe different ways of aggregating, summarising and visualising data.
- explain the principles of supervised and unsupervised learning.
- explain the importance of evaluating models based on their predictive ability.
Competences and skills
For a passing grade the student must
- be able to wrangle, present and visualise data to highlight important features in a complex data set.
- be able to use common methods for supervised and unsupervised learning.
- be able to draw conclusions regarding a data set, based on results from classification and regression methods.
- be able to use common methods for evaluating predictive ability on out-of-sample data.
- be able to present the analysis and conclusions of a practical problem in a written report.
Judgement and approach
For a passing grade the student must
- reflect on the limitations of the chosen model and method, as well as on alternative solutions.
- reflect on the possible issues with fitting multiple models to the same data set.
Contents
- Basic methods for data handling and common methods for data visualisation.
- Methods for unsupervised and supervised learning, such as clustering, hierarchical clustering, and regression and decision tree methods for classification and regression problems.
- Methods for model selection and validation, such as the bootstrap, splitting the data into training and test sets, and cross-validation.
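As an illustration of two of the validation methods listed above, the following is a minimal sketch (not course material) that compares a single training/test split with k-fold cross-validation for a linear regression model. The synthetic data, the helper names (fit_linear, predict_linear, mse, k_fold_mse) and the choice of NumPy are assumptions made for this example only.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions from coefficients returned by fit_linear."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return float(np.mean((y_true - y_pred) ** 2))


def k_fold_mse(X, y, k=5, seed=0):
    """Average out-of-sample MSE over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_linear(X[train], y[train])
        scores.append(mse(y[test], predict_linear(coef, X[test])))
    return float(np.mean(scores))


# Hypothetical synthetic data: three predictors, noise variance 0.25.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Single split: fit on the first 150 rows, evaluate on the last 50.
train, test = np.arange(150), np.arange(150, 200)
coef = fit_linear(X[train], y[train])
train_mse = mse(y[train], predict_linear(coef, X[train]))
test_mse = mse(y[test], predict_linear(coef, X[test]))

# 5-fold cross-validation: every row is used for testing exactly once.
cv_mse = k_fold_mse(X, y, k=5)

print(f"train MSE {train_mse:.3f}, test MSE {test_mse:.3f}, CV MSE {cv_mse:.3f}")
```

The test and cross-validation errors estimate predictive ability on data not used for fitting, whereas the training error tends to be optimistic because the same rows are used for both fitting and evaluation.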
Examination details
Grading scale: TH - (U, 3, 4, 5) - (Fail, Three, Four, Five)
Assessment:
The final grade is determined by the final project. In addition, a passing grade on all written lab reports and attendance at half of the scheduled labs are required.
The examiner, in consultation with Disability Support Services, may deviate from the regular form of examination in order to provide a permanently disabled student with a form of examination equivalent to that of a student without a disability.
Modules
Code: 0124. Name: Computer Lab 1.
Credits: 2.0. Grading scale: UG - (U, G).
Assessment: Reporting of the lab
The module includes: Data handling and visualisation.
Code: 0224. Name: Computer Lab 2.
Credits: 2.0. Grading scale: UG - (U, G).
Assessment: Reporting of the lab
The module includes: Continuous prediction (regression)
Code: 0324. Name: Project.
Credits: 3.5. Grading scale: TH - (U, 3, 4, 5).
Assessment: Written and oral project presentation.
The module includes: Classification and synthesis of the entire course.
Admission
Admission requirements:
- ((FMAA20 Linear Algebra with Introduction to Computer Tools or FMAA21 Linear Algebra with Numerical Applications or FMAB20 Linear Algebra or FMAB22 Linear Algebra)
and
(FMAB30 Calculus in Several Variables or FMAB35 Calculus in Several Variables))
or
(FMSF20 Mathematical Statistics, Basic Course or FMSF25 Mathematical Statistics - Complementary Project or FMSF32 Mathematical Statistics or FMSF45 Mathematical Statistics, Basic Course or FMSF50 Mathematical Statistics, Basic Course or FMSF55 Mathematical Statistics, Basic Course or FMSF70 Mathematical Statistics or FMSF75 Mathematical Statistics, Basic Course or FMSF80 Mathematical Statistics, Basic Course)
Assumed prior knowledge:
A basic course in mathematical statistics and knowledge of linear algebra.
The number of participants is limited to: 50
Selection: Completed university credits within the programme. (Note that only credits which, according to Ladok, have been included in the programme before the selection process count. For students taking master's programmes, 180 credits corresponding to a bachelor's degree are added.) Priority is given to students enrolled in programmes that include the course in their curriculum. Among these students, a place is guaranteed to those in the Riskmodellering specialisation of the Risk, säkerhet och krishantering programme.
The course overlaps the following courses:
FMSF86
FMAN45
EDAN96
Reading list
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning with Applications in R. Springer, 2021, ISBN: 978-1071614174. Either the book for Python or R. Available at https://www.statlearning.com.
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor: An Introduction to Statistical Learning with Applications in Python. Springer, 2023, ISBN: 3031387465. Either the book for Python or R. Available at https://www.statlearning.com.
- Jacob T. VanderPlas: Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly, 2022, ISBN: 1098121228. Side reading.
- Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund: R for Data Science, 2nd Edition. O'Reilly Media, 2023, ISBN: 1492097403. Side reading. Available at https://r4ds.hadley.nz.
Contact
Course coordinator: Linda Hartman, linda.hartman@matstat.lu.se
Director of studies: Johan Lindström, studierektor@matstat.lu.se
Course administrator: Susann Nordqvist, expedition@matstat.lu.se
Course homepage: https://www.maths.lu.se/utbildning/civilingenjoersutbildning/matematisk-statistik-paa-civilingenjoersprogram/
Further information
Given in parallel with FMSF86. Only one of the courses FMSF86 and FMSF90 may be included in a degree. The course overlaps with EDAN96.